N 93-12386 

A Hybrid Approach to Software Repository Retrieval: 

Blending Faceted Classification and Type Signatures 


D. Eichmann 


Dept. of Statistics and Computer Science
West Virginia University 
Morgantown, WV 26506 
email: eichmann@a.cs.wvu.wvnet.edu


Abstract 

We present a user interface for a software reuse repository
that relies both on the informal semantics of faceted
classification and the formal semantics of type signatures
for abstract data types. The result is an interface
providing both structural and qualitative feedback to a
software reuser.

1. Introduction 

The importance of software reusability as a subdis- 
cipline of software engineering is easily demonstrated 
by recent publications in the area, including a substan- 
tial two-volume collection edited by Biggerstaff and 
Perlis [2,3], as well as the earlier collection edited by 
Tracz [18]. 

Our current research focuses on composition-based
reuse rather than generation-based reuse [4], since we
feel that this area promises the best short-term results.
As repositories increase in size and the components
contained within them increase in complexity,
increasing demands are placed upon the reuser, and
thereby the retrieval mechanism, to discriminate
between large numbers of candidate components. This
paper discusses one such retrieval mechanism.

2. Background 

This work draws from three areas of previous work: fac- 
eted classification, algebraic specification, and type in- 
ference. 

2.1. Faceted Classification 

Faceted classification was first proposed as a retrieval 
mechanism by Prieto-Diaz [14], and subsequently used 
in at least two repository efforts [1, 10]. The technique 
is founded upon the notion of literary warrant. 

2.1.1. Literary Warrant

Literary warrant is a technique used in library science
for the classification of texts [19]. Representative samples
of works generate a set of descriptive terms subsequently
organized for use in clustering the set of works
as a whole.

2.1.2. Conceptual Closeness

The vocabulary of terms built up through literary warrant
typically contains a great deal of semantic overlap:
words whose meanings are the same, or at least similar.
For instance, two components, one implementing a
stack and the other a queue, might both be characterized
with the word insert, corresponding to push and enqueue,
respectively.

Synonym ambiguity is commonly resolved through the
construction of a restricted vocabulary, tightly controlled
by the repository administrators. Repository users
must learn this restricted vocabulary, or rely upon
the assistance of consultants already familiar with it. It
is rarely the case, however, that the choice is between
two synonyms. More typically it is between words
which have similar, but distinct meanings.

The words in the two pairs (insert, push) and (insert, enqueue)
are conceptually close, that is, they both are
plausible characterizations of one of the operations for
each of their respective components, and yet they have
distinct definitions in normal English usage. This further
leads to the notion that the word pair (push, enqueue)
should similarly be conceptually close, if only
transitively through the common word insert.

Attaching a weight from the interval (0,1) supports a
closeness metric for word pairs, and additionally supports
transitive weights as the product of the weights involved.
For example, we might associate a weight of .8
to the pair (insert, push) and .9 to the pair (insert, enqueue),
and thus a weight of .8 * .9 = .72 to the pair
(push, enqueue).
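
The transitive weight computation can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the table `CLOSENESS` and the helper names `direct_weight` and `transitive_weight` are hypothetical, and the only data used are the two weights from the example above.

```python
from math import prod

# Hypothetical closeness table holding the two weights from the text.
CLOSENESS = {
    ("insert", "push"): 0.8,
    ("insert", "enqueue"): 0.9,
}


def direct_weight(a, b):
    """Return the stored weight for an unordered word pair, or None."""
    return CLOSENESS.get((a, b), CLOSENESS.get((b, a)))


def transitive_weight(path):
    """Multiply the direct weights along a chain of words."""
    weights = [direct_weight(a, b) for a, b in zip(path, path[1:])]
    if any(w is None for w in weights):
        raise KeyError("no direct weight for some pair on the path")
    return prod(weights)


# push and enqueue are related only through the common word insert.
print(round(transitive_weight(["push", "insert", "enqueue"]), 2))  # prints 0.72
```

Because every weight lies strictly between 0 and 1, each additional step along a chain can only lower the product, which is the behavior the closeness metric relies on.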

Note that transitive closeness of conceptually close
pairs results in a conceptually close pair, and transitive
closeness of distant pairs results in an even more distant
pair. Thus, the choice of the weights is critically important
to the success and utility of a user interface incorporating
conceptual closeness.

Third International Conference on Software
Engineering and Knowledge Engineering, Skokie, IL,
June 27-29, 1991, pages 236-240.

2.1.3. Lattice-based Faceted Classification

Eichmann and Atkins [9] described an approach to faceted
classification that focused upon a structural frame-
work (type lattices) as an alternative to explicit close- 
ness weights. Each component possessed one or more 
tuples characterizing it, each comprised of a non-empty 
set of facet values. Users posed queries as tuples, and 
reuse candidates were retrieved based upon their con- 
formance to the query tuple. 

2.2. Type Signatures 

An algebraic specification contains both a syntactic 
characterization of a component (the signature) and a 
semantic characterization of a component (the axioms). 
Algebraic specifications therefore are aptly suited as 
formal descriptions of software components. 

Traditional efforts in reuse concentrated on the struc- 
tural interfaces between components [1,2], and hence 
solely on the signature portion of the specification. This 
proved less than adequate for component discrimina- 
tion, in the face of numerous candidate components, all 
with the same interface, and directly prompted the work 
in faceted classification described above. 

2.3. Type Inference 

Recent research in programming languages has resulted
in a number of languages that are strongly typed and
yet are flexible and remarkably expressive (e.g., ML
[13]). Such languages rely heavily on inferential
mechanisms to ensure safe computation [5, 12]. The
concept of conformance is particularly relevant to software
repository query mechanisms [11]. Conformance
allows one type instance to be treated as if it were an
instance of another type, and can hold for arbitrary types,
regardless of the type ordering scheme (e.g., inheritance).

Type inference notation organizes around a set of inference
rules, comprised of sets of premises and conclusions,
separated by a horizontal line. The symbol A represents
an existing set of assumptions. A always contains
the type information generated by the database
schema implementing the repository. A.x denotes the
set of assumptions extended with some fact x. $A \vdash x$
states that, given a set of assumptions A and the currently
defined set of inference rules, x can be inferred.
An expression is well-typed if a type for the expression
can be deduced using the available inference rules;
otherwise it is ill-typed.

3. A Hybrid Approach 

The approach advocated here combines the semantic 
flexibility of faceted classification with the structural 
formality of type signatures. We accomplish this 
through the incorporation of function and abstract data 
type (ADT) definitions into the type lattice of [6]. 

3.1. The Type Lattice 

As shown in Figure 1, there are four principal sublattices
comprising the complete type lattice, corresponding to
the types generated by facet sets, tuples, functions and
ADTs. In addition, the universal type, $\top$, and the void
type, $\bot$, ensure that a least upper bound and a greatest
lower bound, respectively, exist for any two types in the
lattice. The usual built-in types (e.g., integers, strings,
etc.) are not shown, in order to simplify the presentation.
In principle, they can be specified as ADTs if
needed.

Figure 1.


$Facet_{\emptyset}$ characterizes the empty generic facet type; it
contains no values, but is still a facet. Likewise, Facet
characterizes the set of all possible facet values. The
dotted line indicates an arbitrary number of intermediate
types.

The tuple sublattice has a similar structure. At the top is
the empty tuple type, {}, characterizing a type with no
components. At the bottom is Tuple, the tuple type with
all possible components.

Function types are bounded above by $\bot \to \top$, the function
type with a void domain and universal range, and
are bounded below by $\top \to \bot$, the function type with a
universal domain and void range.

ADT types are bounded above by $\exists e.e$, the abstract type
denoting a hidden type, e, with no information or operations
available, and are bounded below by ADT, the
type denoting all possible types with all possible operations.


3.2. Inference Rules 

3.3. Facets 

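As in [6], we characterize facets as the inverse of our
usual notion of interval subtypes; a facet subtype denotes
a larger collection of facet values than does its supertype.
Inference rule (1) formalizes this for a complete facet.

$$A \vdash t \le t(m \ldots n) \qquad (1)$$

Inference rule (2) does likewise for two singleton intervals,
and inference rule (3) for two arbitrary collections
of intervals.

$$\frac{A \vdash m \in t \quad A \vdash m' \in t \quad A \vdash n \in t \quad A \vdash n' \in t \quad A \vdash m' \le m \le n \le n'}{A \vdash t(m' \ldots n') \le t(m \ldots n)} \qquad (2)$$

$$\frac{A \vdash t(m_1 \ldots n_1) \le t(m_1' \ldots n_1') \quad \cdots \quad A \vdash t(m_k \ldots n_k) \le t(m_k' \ldots n_k')}{A \vdash t(m_1 \ldots n_1, \ldots, m_k \ldots n_k) \le t(m_1' \ldots n_1', \ldots, m_k' \ldots n_k')} \qquad (3)$$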

A number of inference rules not presented here address
the reduction and manipulation of intervals [6].
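
The inverted ordering on facets can be illustrated with a small Python sketch of the complete-facet and single-interval cases. It is a simplification under stated assumptions: the class `FacetType`, the function `facet_subtype`, the facet name `size`, and the encoding of facet values as integer positions are all hypothetical, and an interval is never treated as covering the complete facet.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class FacetType:
    """Facet t, optionally narrowed to the interval t(low...high).

    Integer bounds are an assumption of this sketch; leaving both
    bounds unset stands for the complete facet t.
    """
    facet: str
    low: Optional[int] = None
    high: Optional[int] = None

    @property
    def complete(self) -> bool:
        return self.low is None


def facet_subtype(sub: FacetType, sup: FacetType) -> bool:
    """True if sub <= sup: the subtype denotes the larger set of values."""
    if sub.facet != sup.facet:
        return False
    if sub.complete:
        return True   # complete facet: t <= t(m...n), and t <= t
    if sup.complete:
        return False  # simplification: no interval covers the whole facet
    # single intervals: m' <= m <= n <= n' gives t(m'...n') <= t(m...n)
    return sub.low <= sup.low <= sup.high <= sub.high


whole = FacetType("size")
wide = FacetType("size", 1, 10)
narrow = FacetType("size", 3, 5)
print(facet_subtype(whole, narrow))  # True
print(facet_subtype(wide, narrow))   # True
print(facet_subtype(narrow, wide))   # False
```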

3.3.1. Tuples 

We view a tuple r to be of type record, $\{a_1 : t_1, \ldots, a_n : t_n\}$,
where attribute $a_i$ is of type $t_i$. We assume that $t_i$ is some facet,
function, or ADT type. Since attributes are labeled, components
may appear in any order, and two types are assumed to
be equivalent if they only differ in the order of their respective
attributes.

Inference rule (4) characterizes subtyping for tuples.
Informally, one tuple type is a subtype of another if it
has all of the attributes of the other (and possibly more),
and for those common attributes, the type of a given attribute
in the tuple subtype must be a subtype of that
attribute’s type in the tuple supertype.


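$$\frac{A \vdash 1 \le m \le n \quad A \vdash t_1' \le t_1 \quad \cdots \quad A \vdash t_m' \le t_m}{A \vdash \{i_1 : t_1', \ldots, i_m : t_m', \ldots, i_n : t_n'\} \le \{i_1 : t_1, \ldots, i_m : t_m\}} \qquad (4)$$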
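
Tuple subtyping as described informally above can be sketched in Python. The sketch makes several assumptions that are not part of the paper: tuple types are encoded as dictionaries from attribute labels to type names, `attr_subtype` is a stand-in for the facet, function, and ADT rules, and the `language` attribute and its value are hypothetical.

```python
def record_subtype(sub, sup, attr_subtype):
    """True if tuple type `sub` is a subtype of tuple type `sup`.

    `sub` must carry every attribute of `sup` (and possibly more), and
    each shared attribute's type in `sub` must be a subtype of its type
    in `sup`. Dictionary order is irrelevant, mirroring labeled attributes.
    """
    return all(
        label in sub and attr_subtype(sub[label], sup[label])
        for label in sup
    )


# Stand-in attribute ordering: equality plus one declared pair, taken
# from the stack example used later for ADT instances.
DECLARED = {("stack of integer(1..10)", "stack of integer")}


def attr_subtype(a, b):
    return a == b or (a, b) in DECLARED


query = {"structure": "stack of integer"}
component = {"structure": "stack of integer(1..10)", "language": "Ada"}

print(record_subtype(component, query, attr_subtype))  # True
print(record_subtype(query, component, attr_subtype))  # False
```

The second call fails because the query type lacks the `language` attribute, so it cannot stand in for the wider component type.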

Inference rules (5) and (6) support definition of tuple 
constants and extraction of an attribute value, respec- 
tively. 

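$$\frac{A \vdash e_1 : t_1 \quad \cdots \quad A \vdash e_n : t_n}{A.(r = \{i_1 = e_1, \ldots, i_n = e_n\}) \vdash r : \{i_1 : t_1, \ldots, i_n : t_n\}} \qquad (5)$$

$$\frac{A \vdash r : \{i_1 : t_1, \ldots, i_n : t_n\} \quad A \vdash 1 \le j \le n}{A \vdash r.i_j : t_j} \qquad (6)$$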

3.3.2. Functions

Function types are useful both for characterizing pro- 
grams and for characterizing the operations of ADTs. 
Inference rule (7) characterizes the usual notion of 
lambda abstraction, where x is the parameter, t’ the pa- 
rameter’s type, e is the body of the function, and t the 
type of the function’s result. 

$$\frac{A.x : t' \vdash e : t}{A \vdash \lambda(x : t')\, e : (t' \to t)} \qquad (7)$$

One function type, $s \to t$, is a subtype of another, $s' \to t'$,
if the subtype function accepts the entire domain of the
function supertype (i.e., $s' \le s$), and produces a range
contained in the supertype range (i.e., $t \le t'$), as shown in
inference rule (8).

$$\frac{A \vdash s' \le s \quad A \vdash t \le t'}{A \vdash s \to t \le s' \to t'} \qquad (8)$$

Function subtyping seems a little strange at first, but a
simple example helps. Assume that f is a function type
$(1..4) \to true$ and g is a function type $(2..3) \to (true..false)$.
Function type f is a subtype of g. Any instance
of f can always replace an instance of g in an expression
without affecting the type-safety of the expression.
The instance of f handles at least the values
the supertype function does, and produces no more values
than does the supertype function.
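
The f and g example can be checked mechanically with a short Python sketch. It assumes, for illustration only, that domains and ranges are finite sets of values ordered by ordinary set inclusion (the usual interval ordering, not the inverted facet ordering); the names `FunType`, `value_subtype`, and `fun_subtype` are hypothetical.

```python
from typing import NamedTuple


class FunType(NamedTuple):
    dom: frozenset  # values the function accepts
    rng: frozenset  # values the function may produce


def value_subtype(s, t):
    """Ordinary interval ordering: s <= t when s is contained in t."""
    return s <= t


def fun_subtype(f, g):
    """s -> t <= s' -> t' when s' <= s (domain) and t <= t' (range)."""
    return value_subtype(g.dom, f.dom) and value_subtype(f.rng, g.rng)


f = FunType(frozenset(range(1, 5)), frozenset({True}))         # (1..4) -> true
g = FunType(frozenset(range(2, 4)), frozenset({True, False}))  # (2..3) -> (true..false)

print(fun_subtype(f, g))  # True
print(fun_subtype(g, f))  # False
```

The domain comparison runs in the opposite direction from the range comparison, which is what makes the rule look strange at first.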

Inference rule (9) characterizes the type of the result of a
function application; if the expression supplied as an argument
is of the proper type, then the result of the function
applied to that expression will be well-typed.

$$\frac{A \vdash e : (t' \to t) \quad A \vdash e' : t'}{A \vdash e(e') : t} \qquad (9)$$

3.3.3. ADTs


Inference rules (10) and (11) define type inference for
existential types [4]. An existential type consists of a
type variable a, representing the type, packaged with
some number ($j_1 \ldots j_n$) of instances of the type and/or
operations over the type.

$$\frac{A \vdash e_1 : s_1[t/a] \quad \cdots \quad A \vdash e_n : s_n[t/a]}{A \vdash \mathbf{pack}\ (a = t\ \mathbf{in}\ (j_1 : s_1, \ldots, j_n : s_n))\ (e_1, \ldots, e_n) : \exists a.(j_1 : s_1, \ldots, j_n : s_n)} \qquad (10)$$

$$\frac{A \vdash e : \exists b.(j_1 : s_1, \ldots, j_n : s_n) \quad A.(x : (j_1 : s_1, \ldots, j_n : s_n)[a/b]) \vdash e' : t}{A \vdash \mathbf{open}\ e\ \mathbf{as}\ x\ [a]\ \mathbf{in}\ e' : t} \qquad (11)$$

A given expression $e_i$ is of type $s_i$ when t is substituted
for a in $s_i$, and serves as the implementation of the value
or operation labeled $j_i$ in the abstract type. This substitution
results in a concrete type (i.e., one with no type variables
in it) for the expression. The substitution type t
serves as the representation of the abstract type, denoted
externally by the existential variable a. The actual representation
and the implementations of the operations
are not visible externally.

The pack operation constructs an instance of an abstract
type, and encapsulates its representation. The open operation
performs the converse, binding an abstract type
variable to a concrete type, and evaluating some expression
in the context of the (now concrete) abstract type.

Subtyping of ADTs derives from subtyping of the type
parameters for the abstract type. Inference rule (12)
characterizes subtyping of two instances of abstract
types.

$$\frac{A.(t_1 \le t_2) \vdash t \le t'}{A \vdash (\exists (t_1 \le t_2).t) \le (\exists (t_1 \le t_2).t')} \qquad (12)$$

Note that in addition to providing subtyping of two
ADTs, rule (12) also supports subtyping of two instances
of the same ADT.

For an example of the former, $\exists T'.\exists (T \le T').T''$ denotes
an existential type T'' generated by a type parameter T,
which must be a subtype of the existential type T'. Since
instances of abstract types are cross products of instances
and operations, T would be a subtype of T'
through additional operations. An example of this appeared
in [17], showing stacks and dequeues as subtypes
of queues.

For an example of the latter, stack of integer(1..10) is a
subtype of stack of integer.

4. The User Interface

A query is a boolean expression containing predicates
and the operators and, or, and not. A predicate is simply
a constant of type tuple. When a user issues a query, the
query evaluator first treats all of the facet values in the
query as synonyms and replaces them with actual facet
values from a value/synonym relation. For example,
database, databases, data base, and data bases might
all be replaced with database.
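
The synonym replacement step can be sketched as a lookup table in Python. The table `SYNONYMS` and the function `normalize` are hypothetical names; the entries are only the four spellings from the example above, and lower-casing plus whitespace collapsing is an added assumption.

```python
# Hypothetical value/synonym relation: synonym -> actual facet value.
SYNONYMS = {
    "databases": "database",
    "data base": "database",
    "data bases": "database",
}


def normalize(term):
    """Map a user-supplied facet value to the repository's actual value."""
    key = " ".join(term.lower().split())  # assumption: ignore case and spacing
    return SYNONYMS.get(key, key)


for word in ["database", "databases", "data base", "data bases"]:
    print(normalize(word))  # prints database four times
```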

The evaluator then locates all of the relations in the data- 
base whose type conforms to some predicate of the 
query by testing the type of each relation in turn, using 
the inference rules previously described. The query lat- 
tice space for a given predicate is bounded above by the 
predicate type itself, and bounded below by the partition 
tuples that conform to it. For each user-specified predi- 
cate, the evaluator forms the disjunction of conforming 
relation tuples (with variables in each position) and then 
substitutes the conjunction of the disjunction and the 
new predicate in place of the original, user-specified 
predicate. The result of evaluating this query is then a
set of component references for display and, optionally,
retrieval from the text storage area.
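
A much simplified sketch of this evaluation, in Python, is given here for orientation only. It assumes that conformance reduces to attribute containment, that facet values match by equality, and that the boolean operators map onto set operations; the relation layout, the `component` key, and the file names are hypothetical and are not the repository's actual schema.

```python
def conforms(attributes, predicate):
    """Assumed conformance test: the relation has every queried attribute."""
    return set(predicate) <= set(attributes)


def evaluate_predicate(predicate, relations):
    """Collect component references from all conforming relations."""
    hits = set()
    for relation in relations:
        if not conforms(relation["attributes"], predicate):
            continue
        for row in relation["rows"]:
            if all(row[name] == value for name, value in predicate.items()):
                hits.add(row["component"])
    return hits


relations = [
    {"attributes": {"function", "object"},
     "rows": [
         {"function": "insert", "object": "stack", "component": "stack.ada"},
         {"function": "insert", "object": "queue", "component": "queue.ada"},
     ]},
    {"attributes": {"function"},
     "rows": [{"function": "sort", "component": "qsort.c"}]},
]

inserts = evaluate_predicate({"function": "insert"}, relations)
stacks = evaluate_predicate({"object": "stack"}, relations)

print(sorted(inserts & stacks))  # and: ['stack.ada']
print(sorted(inserts | stacks))  # or:  ['queue.ada', 'stack.ada']
print(sorted(inserts - stacks))  # and not: ['queue.ada']
```

The second relation has no `object` attribute, so it does not conform to the `object` predicate and is skipped without inspecting its rows.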

Note that since tuples of more than a single type may be
displayed to the user, the query language is polymorphic
in one of the manners discussed in [7].

5. Discussion

The work described here is another in a series of experi- 
mental user interfaces for software reuse repositories. 
Our initial efforts concentrated specifically on provid- 
ing substructure for faceted classification [9]. This ap- 
proach relied only upon the expertise of the classifier in 
populating the repository, and as such, suffered from 
what we refer to as the vocabulary problem. 

The interface described here ameliorates the situation
by supporting as part of the query tuple the specification
of a formal interface structure to which the components
of interest must conform.

A parallel effort exploring the role that algebraic specification
can play in repository retrieval appears in [8].
This work is concerned particularly with retrieval over
type signatures and behavioral axioms.

6. References 

[1] J. Atkins, private communication, 1989. 

[2] T. J. Biggerstaff and A. J. Perlis, Software Reusability,
vol. 1 - Concepts and Models, Addison-Wesley,
New York, NY, 1989.

[3] T. J. Biggerstaff and A. J. Perlis, Software Reusability,
vol. 2 - Applications and Experience, Addison-Wesley,
New York, NY, 1989.

[4] T. J. Biggerstaff and C. Richter, “Reusability
Framework, Assessment, and Directions,” IEEE
Software, vol. 4, no. 2, pages 41-49, March 1987.

[5] L. Cardelli, “Basic Polymorphic Typechecking,”
Science of Computer Programming, vol. 8, pages
147-172, 1987.

[6] L. Cardelli and P. Wegner, “On Understanding 
Types, Data Abstraction, and Polymorphism,” 
ACM Computing Surveys, vol. 17, no. 4, pages 
471-522, December 1985. 

[7] D. Eichmann, Polymorphic Extensions to the Relational
Model, Ph.D. dissertation, Dept. of Computer
Science, The University of Iowa, Iowa City,
IA, August 1989.

[8] D. Eichmann, "Selecting Reusable Components 
Using Algebraic Specifications," Second Interna- 
tional Conference on Algebraic Methodology and 
Software Technology (AMAST), Iowa City, IA, 
May 22-25, 1991. 

[9] D. Eichmann and J. Atkins, “Design of a Lattice- 
Based Faceted Classification System,” Second In- 
ternational Conference on Software Engineering 
and Knowledge Engineering, Skokie, IL, pages 
90-97, June 21-23, 1990. 


[10] E. Guerrieri, “On Classification Schemes and Re- 
usability Measurements for Reusable Software 
Components,” SofTech Technical Report IP-256, 
SofTech, Inc, Waltham, MA 1987. 

[11] C. Horn, “Conformance, Genericity, Inheritance 
and Enhancement,” ECOOP-87 - Proc. Euro- 
pean Conference on Object-Oriented Program- 
ming, Paris, France, pages 223-233, June 15-17, 
1987. 

[12] R. Milner, “A Theory of Type Polymorphism in
Programming,” Journal of Computer and System
Sciences, vol. 17, pages 348-375, 1978.

[13] R. Milner, M. Tofte, and R. Harper, The Definition
of Standard ML, MIT Press, Cambridge, MA,
1990.

[14] R. Prieto-Diaz, A Software Classification
Scheme, Ph.D. dissertation, Dept. of Information
and Computer Science, University of California,
Irvine, CA, 1985.

[15] R. Prieto-Diaz and P. Freeman, “Classifying Software
for Reusability,” IEEE Software, vol. 4, no.
1, pages 6-16, January 1987.

[16] J. V. Guttag and J. J. Horning, “The Algebraic
Specification of Abstract Data Types,” Acta Informatica,
vol. 10, pages 27-52, 1978.

[17] A. Snyder, “Inheritance in the Development of 
Encapsulated Software Components,” Research 
Directions in Object-Oriented Programming, B. 
Shriver and P. Wegner, eds., MIT Press, 
Cambridge, MA, pages 165-188, 1987. 

[18] W. Tracz, ed., Tutorial, Software Reuse: Emerging
Technology, IEEE Computer Society Press,
Los Angeles, CA, 1988.

[19] B. C. Vickery, Faceted Classification: A Guide to
Construction and Use of Special Schemes, Aslib,
London, 1960.








